Handling of duplicate keys in objects #85

Merged · 4 commits · Dec 21, 2023
Conversation

kutsurak (Contributor)

Here is the promised pull request for the handling of duplicate keys. It is just a linear pass over the parsed object, keeping the last value when a key has been seen before. As I mentioned in #84, this will probably have a performance impact if the objects are large, but I have not tested it. I also added a few tests to make sure this change behaves reasonably.

As a general conclusion, I think this change does not play very well with the ordered concept in the library (see the expected values of the two tests I added), but I do believe it is the best we can do with respect to correctness.

For performance we might be able to use a more efficient data structure (a hash table comes to mind), but this would probably require more extensive changes to the parser.
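
To make the idea concrete, here is a minimal standalone sketch of such a linear pass over an alist of parsed key/value pairs, keeping the last value for each duplicate key. It is an illustration only, not the exact code in this pull request, and the helper name keep-last-value is made up for the example.

    ;; Illustration only, not the PR's code: keep the last value for each
    ;; duplicate key in an alist, using a single linear pass.
    (define (keep-last-value pairs)
      (let loop ((rest (reverse pairs)) (res '()))
        (cond ((null? rest) res)               ; nothing left to scan
              ((assoc (caar rest) res)         ; key already kept: skip this pair
               (loop (cdr rest) res))
              (else (loop (cdr rest)           ; first time we see this key
                          (cons (car rest) res))))))

    ;; (keep-last-value '(("a" . 1) ("b" . 2) ("a" . 3)))
    ;; => (("b" . 2) ("a" . 3))   ; note the original key order is not preserved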

@aconchillo (Owner)

Hi @kutsurak. I actually implemented this using a hash table, which should fix the performance issues. I took the liberty of borrowing a couple of lines from your unit tests; the other line was not necessary because, now that we use a hash table, we can no longer guarantee the order. Let me know what you think. Thank you!!!! 🙏
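
For illustration, a minimal sketch of the hash-table idea in Guile could look like the following; the helper name dedup-pairs is assumed for the example and this is not the actual guile-json code.

    ;; Sketch only, not guile-json's implementation: de-duplicate an alist
    ;; of parsed key/value pairs with a hash table, keeping the last value
    ;; seen for each key.  Key order is not preserved.
    (define (dedup-pairs pairs)
      (let ((table (make-hash-table)))
        (for-each (lambda (pair)
                    ;; later pairs overwrite earlier ones for the same key
                    (hash-set! table (car pair) (cdr pair)))
                  pairs)
        (hash-map->list cons table)))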

@aconchillo (Owner)

Oooops, forgot to mention #86.

@aconchillo (Owner)

I tested this PR with a large file and got these results:

scheme@(guile-user)> ,t (->bool (call-with-input-file "large-file.json" (lambda (port) (json->scm port))))
$8 = #t
;; 1.261434s real time, 1.657068s run time.  0.465849s spent in GC.

So it doesn't seem to have a big impact, which is nice. Actually, it seems to behave better than the hash-table approach (see #86 (comment)). I need to test bigger files, since the one I tested was an array of objects and the objects were not too big. The hash-table approach would probably be better for really big objects, but I don't think that's usually the case.

@kutsurak (Contributor, Author)

Hello @aconchillo

Yes, I agree that for large objects a hash table will perform better, but for small objects the setup overhead becomes expensive. I am fine with either solution, or even with a combination of the two, where a keyword argument lets users switch between the recursive and the hash-table solutions depending on their data.

Let me know if you want me to do some more work on this.
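
As a purely hypothetical illustration of that combined approach (guile-json does not expose such an option; the name uniquify and the #:strategy keyword are invented here, and the sketch reuses the keep-last-value and dedup-pairs examples above):

    ;; Hypothetical sketch only -- not part of guile-json's API.
    (define* (uniquify pairs #:key (strategy 'linear))
      (case strategy
        ((linear)     (keep-last-value pairs))  ; recursive/linear pass (sketch above)
        ((hash-table) (dedup-pairs pairs))      ; hash-table pass (sketch above)
        (else (error "unknown strategy:" strategy))))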

Review comment on json/parser.scm (outdated)
(cond ((null? pairs) res)
      ((assoc (caar pairs) res)
       (uniquify-keys (cdr pairs) res))
      (#t (uniquify-keys (cdr pairs) (cons (car pairs) res))))
aconchillo (Owner)

You can use else here instead of #t; I think it's a bit more idiomatic.
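
For reference, the excerpt with else in place of #t would read:

    (cond ((null? pairs) res)
          ((assoc (caar pairs) res)
           (uniquify-keys (cdr pairs) res))
          (else (uniquify-keys (cdr pairs) (cons (car pairs) res))))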

@aconchillo (Owner)

Just added one tiny comment. After that I will merge it. Thank you!

aconchillo merged commit 05ce74c into aconchillo:master on Dec 21, 2023
2 checks passed